5 Primary hypotheses
5.1 Estimated probability of superiority
After seeing information about the medication along with a figure, we asked people to estimate the probability of superiority for the treatment over the control condition. Specifically, we asked “What is your best estimate of the probability that a randomly selected patient in the treatment group recovered more quickly than a randomly selected patient in the control group? (A 50% probability would indicate no difference in outcomes between the treatment and control groups.)”
5.1.1 Between-subjects effect of first visualization seen on first probability of superiority estimate
This looks at just the between-subjects effect that seeing SEs vs. SDs had by comparing people’s responses to only the first visualization condition they saw, ignoring the second estimates. For both medication types, people estimated the probability of superiority to be higher, on average, when they saw visualizations with SEs than with SDs.
5.1.1.1 Plots
This is plotted in two ways below, first means + one standard error, and then full distributions of responses. In the second type of plot the means and standard errors are indicated by points and horizontal error bars.
providers_estimated_psup <- plot_beeswarm_two_conditions_with_truth(df, true_effects, "first_condition", "first_superiority_estimate") +
facet_wrap(~ medication_type) +
labs(y="Estimated probability of superiority") +
xlab("First visualization seen")
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
providers_estimated_psup
## Warning: In `position_beeswarm`, method `center` discretizes the data axis (a.k.a the continuous or non-grouped axis).
## This may result in changes to the position of the points along that axis, proportional to the value of `cex`.
## This warning is displayed once per session.
ggsave("figures/providers_estimated_psup.pdf", width=6, height=4)
summary_stats_providers <- df %>% group_by(medication_type, first_condition) %>%
summarize(mu=mean(first_superiority_estimate),
se=sd(first_superiority_estimate)/sqrt(n()))
## `summarise()` has grouped output by 'medication_type'. You can override using
## the `.groups` argument.
blood_se_first_mu <- (summary_stats_providers %>% filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SEs first"))$mu
blood_sd_first_mu <- (summary_stats_providers %>% filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SDs first"))$mu
covid_se_first_mu <- (summary_stats_providers %>% filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SEs first"))$mu
covid_sd_first_mu <- (summary_stats_providers %>% filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SDs first"))$mu
blood_psup_test <- (df %>%
filter(medication_type == "Blood pressure scenario") %>%
t.test(first_superiority_estimate ~ first_condition, data = .) %>%
apa_custom())$statistic %>%
remove_dollar_signs()
covid_psup_test <- (df %>%
filter(medication_type == "COVID-19 scenario") %>%
t.test(first_superiority_estimate ~ first_condition, data = .) %>%
apa_custom())$statistic %>%
remove_dollar_signs()
# Extreme values
blood_extreme_se_perc <- round(100*(df %>%
filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SEs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
blood_extreme_sd_perc <- round(100*(df %>%
filter(medication_type == "Blood pressure scenario" & first_condition == "Saw SDs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
covid_extreme_se_perc <- round(100*(df %>%
filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SEs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
covid_extreme_sd_perc <- round(100*(df %>%
filter(medication_type == "COVID-19 scenario" & first_condition == "Saw SDs first") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
summarize(prop_extreme = mean(extreme)))$prop_extreme)
blood_extreme_test <- (df %>%
filter(medication_type == "Blood pressure scenario") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
t.test(extreme ~ first_condition, data=.) %>%
apa_custom())$statistic %>%
remove_dollar_signs()
covid_extreme_test <- (df %>%
filter(medication_type == "COVID-19 scenario") %>%
mutate(extreme = first_superiority_estimate > 90) %>%
t.test(extreme ~ first_condition, data=.) %>%
apa_custom())$statistic %>%
remove_dollar_signs()
c(blood_psup_test, covid_psup_test, blood_extreme_test, covid_extreme_test)
## [1] "t(65.1) = 6.83, p < .001" "t(77.9) = 6.35, p < .001"
## [3] "t(48.7) = 6.18, p < .001" "t(65.9) = 6.27, p < .001"
5.1.1.2 Significance tests
Our primary, confirmatory hypothesis is that participants will estimate a greater probability of superiority for the treatment when the results of a hypothetical RCT communicate inferential uncertainty (i.e., by displaying standard errors around a point estimate) than when these results communicate outcome uncertainty (i.e., by displaying standard deviations).
The first set of tests of this hypothesis is a between-subjects test using two separate, independent-groups t-tests. Specifically, within each of the two medical scenarios, we compare probability of superiority estimates between participants who see the inferential uncertainty condition first to participants who see the outcome uncertainty condition first.
df %>%
group_by(medication_type) %>%
do(tidy(t.test(first_superiority_estimate ~ first_condition,
data = .,
conf.level = 0.95,
alternative = c("greater")))) %>%
kable()
| medication_type | estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|---|
| Blood pressure scenario | 22.62697 | 88.52941 | 65.90244 | 6.827420 | 0 | 65.09333 | 17.09701 | Inf | Welch Two Sample t-test | greater |
| COVID-19 scenario | 18.93269 | 85.93269 | 67.00000 | 6.346467 | 0 | 77.89890 | 13.96673 | Inf | Welch Two Sample t-test | greater |
We see a statistically significant difference: in both medical scenarios, those who saw SEs first gave higher estimates, on average, than those who saw SDs first. This supports our primary hypothesis.
5.1.1.3 Power calculation
Now a revised power calculation to see what we’d need to detect this effect using estimates from this experiment. We’re changing populations from Turkers to medical providers, so this should all be taken with a grain of salt, but hopefully an educated guess.
plot_data <- df %>%
mutate(direction = ifelse(superiority_estimate_sd < superiority_estimate_se, T, F)) %>%
select(worker_id, medication_type, first_condition, direction, first_superiority_estimate, second_superiority_estimate) %>%
group_by(medication_type, first_condition) %>%
summarize(mu = mean(first_superiority_estimate),
sig = sd(first_superiority_estimate),
se = sd(first_superiority_estimate) / sqrt(n())) %>%
group_by(medication_type) %>%
mutate(y = 1:n())
## `summarise()` has grouped output by 'medication_type'. You can override using
## the `.groups` argument.
min_diff <- plot_data %>% group_by(medication_type) %>% summarize(delta = floor(abs(diff(mu)))) %>% pull(delta) %>% min
max_sd <- plot_data %>% pull(sig) %>% max %>% ceiling
We see differences of 9 and 8 percentage points for the blood pressure and COVID-19 scenarios here, with standard deviations of around 12 (when people see SDs) and 16 (when people see SEs, due to the extreme responses). Assuming the smallest difference we care about is 18 percentage points, and taking a pessimistic estimate of the standard deviation of probability of superiority estimates to be 16, let’s compute the sample size we’d need for a one-sided test with 95% power.
df %>%
group_by(medication_type) %>%
do(data.frame(sd_pooled = sd_pooled(first_superiority_estimate ~ first_condition, data = .))) %>%
kable()
| medication_type | sd_pooled |
|---|---|
| Blood pressure scenario | 14.07129 |
| COVID-19 scenario | 13.89203 |
(power_calc <- power.t.test(delta = min_diff,
sd = max_sd,
sig.level = 0.05,
alternative = "one.sided",
power = 0.95))
##
## Two-sample t test power calculation
##
## n = 17.82329
## delta = 18
## sd = 16
## sig.level = 0.05
## power = 0.95
## alternative = one.sided
##
## NOTE: n is number in *each* group
To detect an actual difference of at least 18 percentage points 95% of the time we’d need approximately 18 participants in each of the four conditions in the experiment for a total of approximately 75 participants.
5.1.2 Within-subjects effect of seeing one visualization vs. the other on probability of superiority
This considers both estimates that people made, comparing their estimated probability of superiority for the SD visualization to the same estimate for the SE visualization. This is broken down by which medication type they were shown and which visualization they saw first.
5.1.2.1 Plots
Regardless of which visualization people saw first, they estimated the probability of superiority to be higher when they saw SEs compared to SDs.
plot_data <- df %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
summarize(mu = mean(diff_superiority_se_vs_sd),
se = sd(diff_superiority_se_vs_sd) / sqrt(n())) %>%
mutate(y = 1:n())
kable(plot_data)
| medication_type | mu | se | y |
|---|---|---|---|
| Blood pressure scenario | 22.20000 | 2.356895 | 1 |
| COVID-19 scenario | 18.35795 | 2.021113 | 2 |
df %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
ggplot(aes(x=medication_type, y=diff_superiority_se_vs_sd)) +
geom_point(mapping=aes(color=medication_type), position = position_jitter(width=0.1)) +
# scale_color_manual(values=c("#af8dc3", "#7fbf7b")) +
stat_summary(fun = mean, geom = "point", size=1.2) +
stat_summary(fun.data="mean_se", fun.args = list(mult=1),
geom="errorbar", color = "black", width=0.1) +
theme(legend.position = "none") +
geom_hline(yintercept = 0, linetype = 'dashed') +
xlab("Scenario") +
ylab("Drop in estimated probability of superiority\nwhen seeing SDs vs. SEs")
ggsave(here("analysis/figures/providers_within_subjects_drop_in_psup.pdf"), width=4, height=4)
within_subjects_diff_statistics <- df %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
summarize(
mu=mean(diff_superiority_se_vs_sd),
se=sd(diff_superiority_se_vs_sd)/sqrt(n())
)
blood_within_mu <- (within_subjects_diff_statistics %>% filter(medication_type == "Blood pressure scenario"))$mu
blood_within_se <- (within_subjects_diff_statistics %>% filter(medication_type == "Blood pressure scenario"))$se
covid_within_mu <- (within_subjects_diff_statistics %>% filter(medication_type == "COVID-19 scenario"))$mu
covid_within_se <- (within_subjects_diff_statistics %>% filter(medication_type == "COVID-19 scenario"))$se
ggplot(plot_data, aes(x = medication_type, y = mu)) +
geom_pointrange(aes(ymin = mu - se, ymax = mu + se)) +
geom_hline(yintercept = 0, linetype = 'dashed') +
#facet_wrap(~ medication_type, scale = "free_x") +
#scale_y_continuous(lim = c(0, 10)) +
labs(x = '',
y = 'Drop in estimated probability of superiority\nwhen seeing SDs vs. SEs',
title = 'Average drop in probability of superiority estimate when seeing SDs vs. SEs',
subtitle = 'Bars show one standard error') +
coord_flip()
scale <- 0.7
df %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
ggplot(aes(x = diff_superiority_se_vs_sd, y = medication_type)) +
geom_density_ridges(stat = "binline", binwidth = 1, scale = scale) +
geom_errorbarh(data = plot_data, aes(x = mu, xmin = mu - se, xmax = mu + se, y = y + scale), height = 0.1) +
geom_point(data = plot_data, aes(x = mu, y = y + scale)) +
coord_cartesian(ylim = c(1.25, 2.25)) +
labs(x = 'Drop in estimated probability of superiority\nwhen seeing SDs vs. SEs',
y = 'First visualization seen',
title = 'Distribution of drop in probability of superiority estimate\nwhen seeing SDs vs. SEs') +
#facet_wrap(~ medication_type, ncol = 1) +
theme(legend.position = "none")
## Warning in geom_errorbarh(data = plot_data, aes(x = mu, xmin = mu - se, :
## Ignoring unknown aesthetics: x
We can also look at the full set of responses. For each person we plot their two estimates (one for when they saw SDs and one for when they saw SEs) connected by a line. The slope of the line indicates how much they changed their estimates, with a positive slope indicating that the estimate when seeing SEs was higher than when seeing SDs. Negatively-sloped lines are colored blue, and as hypothesized, the majority of people revised in this direction (somewhere between 60 and 70 percent of people, depending on the condition).
plot_data <- df %>%
select(worker_id, medication_type, first_condition,
superiority_estimate_sd_lower_than_se,
superiority_estimate_sd_equal_to_se,
superiority_estimate_sd_higher_than_se,
superiority_estimate_sd, superiority_estimate_se) %>%
gather("variable", "value", superiority_estimate_sd, superiority_estimate_se) %>%
mutate(variable = case_when(variable == 'superiority_estimate_sd' ~ 'When seeing SDs',
variable == 'superiority_estimate_se' ~ 'When seeing SEs'),
variable = factor(variable, levels = c('When seeing SEs', 'When seeing SDs')))
plot_data %>%
group_by(medication_type, first_condition) %>%
summarize(frac_sd_lower_than_se = mean(superiority_estimate_sd_lower_than_se),
frac_sd_equal_to_se = mean(superiority_estimate_sd_equal_to_se),
frac_sd_higher_than_se = mean(superiority_estimate_sd_higher_than_se)) %>%
kable()
## `summarise()` has grouped output by 'medication_type'. You can override using
## the `.groups` argument.
| medication_type | first_condition | frac_sd_lower_than_se | frac_sd_equal_to_se | frac_sd_higher_than_se |
|---|---|---|---|---|
| Blood pressure scenario | Saw SEs first | 0.8529412 | 0.1470588 | 0.0000000 |
| Blood pressure scenario | Saw SDs first | 0.6585366 | 0.2682927 | 0.0731707 |
| COVID-19 scenario | Saw SEs first | 0.7692308 | 0.1538462 | 0.0769231 |
| COVID-19 scenario | Saw SDs first | 0.6666667 | 0.1944444 | 0.1388889 |
ggplot(plot_data, aes(x = variable, y = value, group = worker_id, color = superiority_estimate_sd_lower_than_se)) +
geom_point() +
geom_line(alpha = 0.25) +
facet_grid(medication_type ~ first_condition) +
labs(x = '',
y = 'Estimated probability of superiority',
title = 'Change in probability of superiority estimates when seeing SEs vs. SDs') +
theme(legend.position = "none")
5.1.2.2 Significance tests
The second set of tests of our primary hypothesis is a within-subjects test using two separate, independent-groups t-tests. Specifically, within each of the two medical scenarios, we compute a within-participant change score as participants’ probability of superiority estimate in the inferential uncertainty condition minus their estimate in the outcome uncertainty condition. We test whether there is statistical significance for these change scores being greater than zero.
df %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
do(tidy(t.test(x = .$diff_superiority_se_vs_sd,
mu = 0,
conf.level = 0.95,
type = "two.sample",
alternative = c("greater")))) %>%
kable()
| medication_type | estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|
| Blood pressure scenario | 22.20000 | 9.419171 | 0 | 74 | 18.27410 | Inf | One Sample t-test | greater |
| COVID-19 scenario | 18.35795 | 9.083092 | 0 | 87 | 14.99774 | Inf | One Sample t-test | greater |
We see a statistically significant difference: in both medical scenarios, we see that’s people’s estimates for the probability of superiority were higher, on average, when they made the estimate after seeing SEs compared to making the estimate after seeing SDs, leading to a positive change score. This supports our primary hypothesis.
An alternative is to just look at the fraction of people who had a lower estimate when they saw SDs compared to when they saw SEs, implemented as a sign test. This ignores the magnitude of changes and just considers direction.
df %>%
mutate(diff_superiority_se_vs_sd = superiority_estimate_se - superiority_estimate_sd) %>%
group_by(medication_type) %>%
do(tidy(DescTools::SignTest(.$diff_superiority_se_vs_sd, mu = 0)))
## # A tibble: 2 × 9
## # Groups: medication_type [2]
## medication_type estimate statistic p.value parameter conf.low.lwr.ci
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Blood pressure scenario 25 56 1.19e-13 59 15
## 2 COVID-19 scenario 20 64 2.38e-11 73 10
## # ℹ 3 more variables: conf.high.upr.ci <dbl>, method <chr>, alternative <chr>
Again we see a statistically significant difference for both scenarios.